Introduction
Definitions
Before proceeding to study the actual subject of
data structures, it is necessary to make ourselves aware of the various terms
used in the subject. In the present section, the basic terminologies and concepts
are, therefore, described with examples to get the readers started.
Data
The term data means a value or a set of values. In this book the
term data is used in the context of both singular and plural forms. The
following are some examples of data:
(i) 34
(ii) 12/01/1965
(iii) ISBN
81-203-0000-0
(iv) | | |
(v) Pascal
(vi) Æ
(vii) 21, 25, 28, 30, 35, 37, 38, 41, 43
In all
the above examples, some values are represented in different ways. Each value
or a collection of all such values is termed data.
Entity
An entity is one that has certain attributes and which may be
assigned values. For example, an employee in an organisation
is an entity. The possible attributes and their corresponding values for an
entity in the present example are:
Entity : EMPLOYEE
Attributes : NAME DOB SEX DESIGNATION
Values : RAVI ANAND 30/12/61 M DIRECTOR
All the employees in an organization constitute an entity
set. Each attribute of an entity set has a range of values, and this range
is called the domain of the attribute. A domain is, thus, the set of all
possible values that could be assigned to a particular attribute. For example,
in the EMPLOYEE entity, the attribute SEX has the domain {M, F}. It may be
noted that an entity after values having been assigned to it, results in a
composite data.
Information
The term information is used for data with its attribute(s). In
other words, information can be defined as meaningful data or processed data.
For example, the set of data as presented earlier becomes information related
to a person if we impose the meanings as mentioned below.
Data Meaning
34 Age of
the person
12/01/1965 Date of birth of the
person
ISBN 81-203-0000-0 Book number,
recently published by the person
| | | Number
of awards equal to the tally marks achieved by
the person
Pascal Nickname of
the person
Æ Signature
of the person
21, 25,
28, 30, 35, 37, 38, 41, 43 Important ages of the person
Difference between data and
information
From the definition of data and information, it is
evident that these two are not the same, however, there is a relation between
them. Figure 1.1 shows the interrelation between data and information.
As an example, suppose there is a set of data
consisting of the amount of milk consumed by a person in a month. From this
given set of data, information can be retrieved as follows:
(i) What was the total
amount of milk consumed in the month?
(ii) On which day was the maximum amount of milk
consumed?
(iii) On which day was the minimum amount of milk
consumed?
(iv) What was the
average amount of milk consumption per day?
(v) What was the amount of carbohydrates
assimilated?
(vi) What was the
amount of proteins assimilated?
(vii) What was the amount of fats assimilated?
(viii) What was the amount of minerals assimilated?
(ix) What amount of average calories gained per
day?
(x) And so on.
To get
all this information from the given set of data, we need to define certain
processes and then apply the corresponding process on the data. (In fact, the
terms data, entity and information have no standard
definition in computer science and they are often used interchangeably.)
Figure 1.1 Relation
between data and information.
Data type
A data type is a term which refers to the kind of data that may
appear in a computation. The following are a few well-known data types:
Data Data type
34 Numeric
(integer)
12/01/1965 Date
ISBN
81-203-0000-0 Alphanumeric
|
| | Graphics
Pascal String
Æ Image
21,
25, 28, 30, 35, 37, 38, 41, 43 Array
of integers
Real, boolean,
character, complex, etc. are also some more frequently used data types.
Built-in data type
In every programming language, there is a set of data types called built-in
data types. For example, in C, FORTRAN, and Pascal, the data types that are
available as built-in are listed below:
C : int, float, char, double, enum, etc.
FORTRAN : INTEGER,
REAL, LOGICAL, COMPLEX, DOUBLE PRECISION, CHARACTER, etc.
Pascal : Integer, Real, Character, Boolean, etc.
With the built-in data types, programming languages
provide users with a lot of advantages of processing of various types of data.
For example, if a user declares a variable of type Real (say), then several
things are automatically implied, such as how to store a value for that
variable, what are the different operations possible on that type of data, what
amount of memory is required to store, etc. All these things are taken care of
by the compiler or the run-time system manager.
Abstract data type
When an application requires a special kind of data
which is not available as a built-in data type, then it is the programmer’s
responsibility to implement his own kind of data. Here, the programmer has to
specify how to store a value for that data, what are the operations that can
meaningfully manipulate variables of that kind of data, amount of memory
required to store a variable. The programmer has to decide all these things and
accordingly implement them. Programmers’ own data type is termed abstract
data type. The abstract data type is also alternatively termed user-defined
data type. For example, suppose we want to process dates of the form dd/mm/yy. For this, no built-in
data type is known in C, FORTRAN, and Pascal. If a programmer wants to process
dates, then an abstract data type, say Date, has to be devised and various operations
such as adding a few days to a date to obtain another date, finding the days
between two dates, obtaining a day for a given date, etc. have to be defined
accordingly. Besides these, programmers should decide how to store the data,
what amount of memory will be needed to represent a value of Date, etc. An
abstract data type, in fact, can be built with the help of built-in data types
and other abstract data type(s) already built by the programmer. Some
programming languages provide a facility to build abstract data types easily.
For example, using struct/class in C/C++, and using record in Pascal,
programmers can define their own data types.
Concept of Data Structures
A digital computer can manipulate only primitive data, that is, data in
terms of 0’s and 1’s. Manipulation of primitive data is inherent within the
computer and does not require any extra effort on the part of the user. But in
our real-life applications, various kinds of data other than the primitive data
are involved. Manipulation of real-life data (also termed user data) requires
the following essential tasks:
1. Storage representation of user data: User
data should be stored in such a way that the computer can understand it.
2. Retrieval of stored data: Data
stored in a computer should be retrieved in such a way that the user can
understand it.
3. Transformation of user data: Various
operations which require to be performed on user data so that it can be
transformed from one form to another.
The basic
theory of computer science deals with the manipulation of various kinds of
data, wherefrom the concept of data structures comes in. In fact, data
structures constitute the fundamentals of computer science. For a given kind of
user data, its structure implies the following:
1. Domain (D): This is the range of values that the data may have. This domain
is also termed data object.
2. Function (F): This is the set of operations which may legally be applied to
elements of the data object. This implies that for a data structure, we must
specify the set of operations.
3. Axioms (A): This is the set of rules with which the different operations
belonging to F can actually be implemented.
Now we can define the term data structure.
A data
structure D is a triplet, that is, D = (D, F, A) where D is a set of data object, F is a set of functions and A is a set of rules to implement the functions. Let us consider an
example.
We know that for the integer data type (int) in the
C programming language the structure includes the following types:
D = (0, +1,
+2, +3, ...)
F = (+, –, *, /, %)
A = (A set of binary arithmetics
to perform addition, subtraction, division, multiplication, and modulo
operations.)
It can be easily recognized that the triplet (D, F, A) is nothing
but an abstract data type. Also, the elements in set D are not necessarily from primitive data; it may contain elements from
some other abstract data types. Alternatively, an implementation of a data
structure D is a mapping from D to a set of other data structures
D i, i = 1, 2, ..., n,
for some n. More precisely, this mapping specifies how every object of D is to be represented by the objects of D i, i = 1, 2, ..., n. Every function of D must be written using the function of the
implementing data structures D i, i = 1, 2, ..., n.
The fact is that each of the implementing data structures is either a primitive
data type or an abstract data type. We can conclude the discussion with another
example.
Suppose, we want to implement a data type, namely Complex as an abstract data type. Any variable of the complex data type has two
parts: a real part and an imaginary part. In our usual notation, if z is
a complex number, then z = x + iy,
where x and y are the real and imaginary parts, respectively.
Both x and y are of the Real data type which is another
abstract data type (available as a built-in data type). So, the abstract data
type Complex can be defined using the data structure Real as
Complex z {
x : Real
y : Real
}
Now the
set D of Complex can be realized from the domain of x and y
which is Real in this case. Let us specify the set of operations
for the Complex data type, which are stated as F:
F = (Å, –, Ä, ¸, Ñ, || )
Assume that z1 = x1 + iy1 and z2 = x2 + iy2 are two data
of the Complex data type. Then we can define the rules for
implementing the operations in F, thus
giving axioms A. In the current example, for the Complex data type
A = {
z = z1 Å z2 = (x1 + x2) + i(y1 + y2) (complex
addition)
z = z1 – z2 = (x1 – x2) + i(y1 – y2) (complex
subtraction)
z = z1 Ä z2 = (x1 ´ x2 – y1 ´ y2) + i(x1 ´ y2 + x2 ´ y1) (complex
multiplication)
z = z1 ¸ z2
(complex
division)
z = Ñz1 (complex
conjugate)
z = | z1 | = (complex
magnitude)
}
Note that how different operations of the Complex data type can be implemented using the operations +, –, ´, /, of the implementing data structure, namely Real.
Overview of Data Structures
In computer science, several data structures are
known depending on the areas of application. Out of them, a few data structures
are frequently used in almost all application areas and with the help of which
almost all complex data structures can be constructed. These data structures
are known as fundamental data structures or classic data
structures. Figure 1.2 gives a classification of all classic data
structures.
Figure 1.2 Classification of classic data structures.
In
addition to these classic data structures, other data structures such as
lattice, Petri nets, neural nets, search graphs, semantic nets, etc. are known
in various applications. These are known to be very complex data structures.
(The discussion of all these complicated data structures is beyond the scope of
this book.)
It can be
observed from Figure 1.2 that all the classic data structures are classified
into two main classes: linear data structures and nonlinear data
structures. In case of linear data structures, all the elements form a
sequence or maintain a linear ordering. On the other hand, no such sequence in
elements exists, rather all the elements are distributed over a plane, in case
of nonlinear data structures. Again within each classical data structure there
are a number of variations as depicted in Figure 1.2. In this book, we have
planned eight chapters to discuss eight classic data structures, one type in
each chapter. As an overview, all the classic data structures discussed in this
book can be seen in Figure 1.3.
Figure 1.3 Overview
of classic data structures.
Implementation of Data Structures
We will implement all the classic data structures
as mentioned in the previous section in two phases:
Phase 1
Storage representation: Here we will decide how a data
structure can be stored in a computer memory. This storage representation, in
general, is based on the use of other data structures.
Phase 2
Algorithmic notation of various operations of the
classic data structure: A
function for manipulating a data structure is expressed in terms of algorithms
so that the details of the operation can be understood easily and the reader
can implement them with the help of any programming language. It will be
preferable to use C/C++ programming because of their versatile features and
modern programming approaches.
In order
to express the algorithm for a given operation, we will assume different
control structures and notations, as shown in Figure 1.5.
Different
statements that will be used in the algorithms are very much similar to the
statements in the C language. For example, there are some basic constructs in
the C language such as, if (condition) then (statement1) else
(statement2), while (condition) do, for (initialization,
condition, update) do which are considered. All statements in the
algorithm are uniquely identified by their step numbers, so that each of them
can be easily referred to in our subsequent discussions. The final step in an
algorithm, which is a Stop statement, indicates the end of the
algorithm. Sufficient comments will be introduced side by side against
statements to indicate what step a statement will perform. The convention of
putting comments is similar to the convention of placing comments in the C/C++
programming language. All comments are
made small and precise so that they exactly signify the step(s).
Another
convention that we will assume is the naming of variables. The variable names
are either in capital letters or in small letters. The variable names in capital
letters will be treated either as constant or as input to the algorithm. All
the variable names in small letters are local to the algorithm and will be
treated as temporary variables.
The call of an operation (that is a function) is
represented with boldface letters, and usually already defined in the book or
else defined subsequently.
Algorithm <Name of the operation>
Input: <Specification of input data for the operation>
Output: <Specification of output after the successful performance
of the operation>
Remark: <If the operation assumes other data structure for its
implementation or something important>
Steps: 1 .
. . . . . . . . . . . . . . . . 2 If <condition> then // Comment on this step, if any
is applicable 3 .
. . . . . . . . . . . . . . . 4 .
. . . . . . . . . . . . . . . 5 Else 6 .
. . . . . . . . . . . . . . . . . . 7 .
. . . . . . . . . . . . . . . . . 8 EndIf 9 .
. . . . . . . . . . . . . . . . . . 10 .
. . . . . . . . . . . . . . . . . . 11 While
<condition> do 12 .
. . . . . . . . . . . . . . . . . 13 .
. . . . . . . . . . . . . . . . . 14 EndWhile 15 .
. . . . . . . . . . . . . . . . . 16 .
. . . . . . . . . . . . . . . . . /* Comment on the following few
steps, if any is applicable */ 17 For
<loop condition> do 18 .
. . . . . . . . . . . . . . . . . 19 .
. . . . . . . . . . . . . . . . . 20 EndFor 21 .
. . . . . . . . . . . . . . . . . 22 .
. . . . . . . . . . . . . . . . . 23 Stop |
Figure 1.5 Format of
presenting an algorithm for an operation.